Differential gene expression in Montipora capitata rice coral

a bite-size comparison of two groups

Sarah Tanja

4/26/23

Project Goal

Build a reproducible script for differential gene expression (DGE) analysis of Montipora capitata rice coral samples using a reference genome

Borrowed Data

M. capitata development diagram by A. Huffmyer

Methods

What I need to do…

  • Transcript Quantification (Alignment) with HISAT2

  • Differential Expression Analysis HISAT2& DESeq2

But I got confused and used kallisto

04-create-kallisto-index

And the result was this count matrix that I don’t trust:

                                            SRR22293447 SRR22293448 SRR22293449
Montipora_capitata_HIv3___RNAseq.g16177.t2a           0           0           0
Montipora_capitata_HIv3___TS.g32517.t1                0           0           0
Montipora_capitata_HIv3___RNAseq.g13366.t1            0           0           0
Montipora_capitata_HIv3___RNAseq.g226.t1              0           0           0
Montipora_capitata_HIv3___RNAseq.g39492.t1           66          66          72
Montipora_capitata_HIv3___TS.g4674.t2b               13           8           8
                                            SRR22293450 SRR22293451 SRR22293452
Montipora_capitata_HIv3___RNAseq.g16177.t2a           0           3           1
Montipora_capitata_HIv3___TS.g32517.t1                0           0           0
Montipora_capitata_HIv3___RNAseq.g13366.t1            0           0           0
Montipora_capitata_HIv3___RNAseq.g226.t1              0           0           0
Montipora_capitata_HIv3___RNAseq.g39492.t1           78          82          67
Montipora_capitata_HIv3___TS.g4674.t2b                3           2           6
                                            SRR22293453 SRR22293454
Montipora_capitata_HIv3___RNAseq.g16177.t2a           6           2
Montipora_capitata_HIv3___TS.g32517.t1                0           0
Montipora_capitata_HIv3___RNAseq.g13366.t1            0           0
Montipora_capitata_HIv3___RNAseq.g226.t1              0           0
Montipora_capitata_HIv3___RNAseq.g39492.t1          103          65
Montipora_capitata_HIv3___TS.g4674.t2b                7           1

DESeq Negative Binomial

The DESeqDataSet object, named deseq.dds, is then passed to the DESeq() function to fit a negative binomial model and estimate dispersions.

deseq2.dds <- DESeq(deseq2.dds)
deseq2.res <- results(deseq2.dds)
deseq2.res <- deseq2.res[order(rownames(deseq2.res)), ]
Formal class 'DESeqResults' [package "DESeq2"] with 7 slots
  ..@ priorInfo      : list()
  ..@ rownames       : chr [1:54384] "Montipora_capitata_HIv3___RNAseq.10136_t" "Montipora_capitata_HIv3___RNAseq.10187_t" "Montipora_capitata_HIv3___RNAseq.10207_t" "Montipora_capitata_HIv3___RNAseq.10214_t" ...
  ..@ nrows          : int 54384
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 6
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  .. .. ..@ listData       :List of 2
  .. .. .. ..$ type       : chr [1:6] "intermediate" "results" "results" "results" ...
  .. .. .. ..$ description: chr [1:6] "mean of normalized counts for all samples" "log2 fold change (MLE): condition recruits vs embryos" "standard error: condition recruits vs embryos" "Wald statistic: condition recruits vs embryos" ...
  ..@ metadata       :List of 6
  .. ..$ filterThreshold: Named num 1.04
  .. .. ..- attr(*, "names")= chr "63.21844%"
  .. ..$ filterTheta    : num 0.632
  .. ..$ filterNumRej   :'data.frame':  50 obs. of  2 variables:
  .. .. ..$ theta : num [1:50] 0.327 0.34 0.353 0.365 0.378 ...
  .. .. ..$ numRej: num [1:50] 5868 5868 5917 5950 6005 ...
  .. ..$ lo.fit         :List of 2
  .. .. ..$ x: num [1:50] 0.327 0.34 0.353 0.365 0.378 ...
  .. .. ..$ y: num [1:50] 5854 5887 5920 5953 5986 ...
  .. ..$ alpha          : num 0.1
  .. ..$ lfcThreshold   : num 0
  ..@ listData       :List of 6
  .. ..$ baseMean      : num [1:54384] 13.45 2.99 2768.01 46.87 7.97 ...
  .. ..$ log2FoldChange: num [1:54384] 4.442 0.107 -0.943 0.43 2.357 ...
  .. ..$ lfcSE         : num [1:54384] 0.826 1.175 0.104 0.243 0.64 ...
  .. ..$ stat          : num [1:54384] 5.3774 0.0912 -9.0699 1.7665 3.6819 ...
  .. ..$ pvalue        : num [1:54384] 7.56e-08 9.27e-01 1.19e-19 7.73e-02 2.31e-04 ...
  .. ..$ padj          : num [1:54384] 9.34e-07 9.61e-01 4.78e-18 1.83e-01 1.48e-03 ...
[1] 5736    6

Plotting

PCA

Heatmap

Top 50 Differentially Expressed Genes

Log2 Fold Change

Volcano Plot

Next Steps

Week06

  • Go back and rebuild index file using HISAT2 & align reads

Week07

  • Run Differential Expression Analysis using HISAT2& DESeq2

Week08

  • Debug all the inevitable mistakes

  • Clean up scripts, add helpful tips

Week09-10 If there’s time…

  • Go back and build scripts to work with raw sequence data:

    • Quality control with FastQC or MultiQC

    • Trim & Clean with fastp or trimgalore

    • Quality control trimmed sequences

Currently stuck on…

Create HISAT2 index for the reference genome hisat2-build path/to/reference_genome.fasta path/to/hisat2_index/genome


#| code-line-numbers: 2
/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/Montipora_capitata_HIv3.assembly.fasta \
../output/Mcapv3.index